A Few Graph-Based Relational 
Numerical Abstract Domains* 



Antoine Miné

École Normale Supérieure de Paris, France,
mine@di.ens.fr,
http://www.di.ens.fr/~mine



Abstract. This article presents the systematic design of a class of relational
numerical abstract domains from non-relational ones. Constructed domains
represent sets of invariants of the form $(v_j - v_i \in C)$, where $v_j$ and $v_i$
are two variables, and $C$ lives in an abstraction of $\mathcal{P}(\mathbb{Z})$,
$\mathcal{P}(\mathbb{Q})$, or $\mathcal{P}(\mathbb{R})$. We call this family of domains
weakly relational domains. The underlying concept allowing this construction is an
extension of potential graphs and shortest-path closure algorithms in exotic-like
algebras. Example constructions are given in order to retrieve well-known domains
as well as new ones. Such domains can then be used in the Abstract Interpretation
framework in order to design various static analyses. A major benefit of this
construction is its modularity, which allows new abstract domains to be quickly
implemented from existing ones.



1 Introduction 



Proving the correctness of a program is essential, especially for critical and em- 
bedded applications (such as planes, rockets, and so on). Among several correct- 
ness criteria, one should ensure that a program can never perform a run-time 
error (divide by zero, overflow, etc.). A classical method consists in finding a 
safety invariant before each dangerous operation in the program, and checking 
that the invariant implies the correct behavior of the subsequent operation.
Because this task must be performed on the whole program, which may contain
tens of thousands of lines, and must be repeated after even the slightest code
modification, we need a purely automatic static analysis approach.

Discovering the tightest invariants of a program cannot be fully mechanized
in general, so we have to find some kind of sound approximation. By sound, we
mean that the analysis should find an over-approximation of the real invariant.
Thus, we will always discover all bugs in a program; however, we may also report
false alarms.



* This work was supported in part by the RTD project IST-1999-20527 "DAEDALUS"
of the European IST FP5 program.



2 Previous Work 



We will work in the well-known Abstract Interpretation framework, proposed 
by Cousot and Cousot in [HE], which allows us to easily describe sound and 
computable semantics approximations. 

2.1 Numerical Abstract Domains 

The crux of the method is to design a so-called abstract domain, that is to say, a 
practical representation of the invariants we want to study, together with a fixed 
set of operators and transfer functions (union, intersection, widening, assign- 
ment, guard, etc.) used to mimic the semantics of the programming language. 

We will consider here numerical abstract domains. Given the set $\mathcal{V}$ of the
numerical variables of a program, with values in a set $\mathbb{I}$ (that can be
$\mathbb{Z}$, $\mathbb{Q}$, or $\mathbb{R}$), a numerical abstract domain will represent and
manipulate subsets of $\mathcal{V} \to \mathbb{I}$. Well-known non-relational domains include
the interval domain [5] (describing invariants of the form $v_i \in [c_1, c_2]$), the
constant propagation domain ($v_i = c$), and the congruence domain [14]
($v_i \in a\mathbb{Z} + b$). Well-known relational domains include the polyhedron domain [S]
($\alpha_1 v_1 + \cdots + \alpha_n v_n \le c$), the linear equality domain [18]
($\alpha_1 v_1 + \cdots + \alpha_n v_n = c$), and the linear congruence equality domain [15]
($\alpha_1 v_1 + \cdots + \alpha_n v_n \equiv a\ [b]$).

Non-relational domains are fast, but suffer from poor precision: they cannot
encode relations between variables of a program. Relational domains are much
more precise, but also very costly. Consider, for example, the simple program of
Figure 1 that simulates many random walks and stores the hits in an array. Our
goal is to discover that, at program point (•), x is in the set {−5, −3, −1, 1, 3, 5} of
allowed indices for hit, so that the instruction hit[x]++ is correct. The invariants
found at (•) by several methods are shown in Figure 2. Note that, even if the
desired invariant is a simple combination of an interval and a congruence
relation, all non-relational analyses fail to discover it because they cannot infer
the relationship between x and i at program point (♠). It is often the case that,
in order to find a given invariant at the end of a loop, one must be able to express
invariants of a more complex form inside the loop. In this example, the desired
result can be obtained by using relational analyses, as shown in Figure 2.

2.2 Graph-Based Algorithms 

Pratt remarked in [55] that the satisfiability of a set of constraints of the form
$(x - y \le c)$ can be efficiently tested in $\mathbb{Z}$, $\mathbb{Q}$, or $\mathbb{R}$ by looking at
the simple loops of a directed weighted graph, a so-called potential graph. Shostak
then extended this graph-based algorithm in [57] to the satisfiability of constraints
of the form $(\alpha x + \beta y \le c)$ in $\mathbb{Q}$ or $\mathbb{R}$. Harvey and Stuckey proved
in [17] that Shostak's algorithm can be used to check the satisfiability of constraints
of the form $(\pm x \pm y \le c)$ in $\mathbb{Z}$. These approaches focus only on
satisfiability and do not address the problem of manipulating constraint sets.
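Pratt's remark can be illustrated with a short Python sketch (our own illustration, not code from [55]): each constraint $x_j - x_i \le c$ becomes an edge $i \to j$ of weight $c$, and the conjunction is satisfiable exactly when the potential graph has no negative-weight cycle. Cycle detection here uses the Bellman-Ford relaxation scheme, one standard choice among several; the function name and the triple encoding are assumptions.

```python
def satisfiable(n, constraints):
    """Check whether the constraints x_j - x_i <= c over n variables have a solution.

    constraints: list of triples (i, j, c), each encoding x_j - x_i <= c,
    i.e. an edge i -> j of weight c in the potential graph.
    The constraints are satisfiable iff the graph has no negative-weight cycle.
    """
    # Distances from a virtual source linked to every node by a 0-weight edge.
    dist = [0] * n
    # The augmented graph has n + 1 nodes, so n relaxation passes suffice.
    for _ in range(n):
        changed = False
        for i, j, c in constraints:
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
                changed = True
        if not changed:
            # Fixpoint reached: x_k = dist[k] is a solution.
            return True
    # If some edge can still be relaxed, there is a negative cycle.
    return all(dist[i] + c >= dist[j] for i, j, c in constraints)


# x1 - x0 <= 3 and x0 - x1 <= -4 force x1 >= x0 + 4: unsatisfiable.
print(satisfiable(2, [(0, 1, 3), (1, 0, -4)]))  # False
print(satisfiable(2, [(0, 1, 3), (1, 0, -2)]))  # True
```

When the relaxation stabilizes, the distance vector itself is a witness solution, which is why the function can stop early.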



    hit: array {-5,-3,-1,1,3,5} → int;
    for k=1 to 1000 do
      x=0;
      for i=1 to 5 do
    (♠)   if random() then x++; else x--;
      done;
    (•) hit[x]++;
    done



Fig. 1. A simple random walk program, and its control flow graph. 
Fig. 2. Invariants discovered for the program in Figure 1 at program point (•)
using several non-relational (left) and relational (right) analyses.



Using Pratt's remark, the model-checking community developed a structure
called Difference-Bound Matrix (DBM, for short) and algorithms based on the
shortest-path closure of weighted graphs to represent and manipulate constraints
of the form $(x - y \le c)$ and $(x \le c)$. DBMs are used to model-check timed
automata. In [28], Toman and Chomicki introduced periodicity graphs that
manipulate constraints of the form $(x \equiv y + c\ [k])$, and applied them to
constraint logic programming and database queries.

Unlike model-checking and constraint programming, we would like to analyze
generic programs, and not simply systems closed under restricted constraint
forms, such as timed automata or database query languages. Our methodology
is first to choose an invariant form, and then to design a fully-featured abstract
domain (including guard and assignment transfer functions, as well as a widening
operator) able to discover invariants of this form on any program, possibly using
coarse over-approximations for those semantic functions that cannot be
represented exactly using the chosen invariant form.

In [23], we already presented a DBM-based abstract domain able to discover
invariants of the form $(x - y \le c)$ and $(x \le c)$. In [24], we presented a slight
extension, called the octagon abstract domain, able to discover invariants of
the form $(\pm x \pm y \le c)$.
2.3 Our Contribution



Our goal is to propose a new family of numerical abstract domains, based on
shortest-path closure algorithms, that can discover invariants of the form
$(x - y \in C)$, where $C$ lives in a non-relational domain. This family generalizes
the DBM-based abstract domain of [23] and allows us to build new domains, such as
the zone-congruence domain that discovers invariants of the form $(x \equiv y + c\ [k])$.
In terms of cost and precision, such relational domains lie between non-relational
domains and classical relational domains, such as polyhedra or congruence
equalities. Thus, we will call these domains weakly relational domains.

We claim that such domains are useful as they give, on the example of Figure 1,
almost the same result as the relational analyses, for a smaller cost. Do not
be fooled by the simplicity of this example program; the abstract interpretation
framework allows the design of complex inter-procedural analyses [1] adapted to
real-life programming languages. A numerical abstract domain is just one brick in
the design of an analysis; it can be plugged into many existing analyses, such as
pointer [10], string cleanness [11], and termination analyses [3], analyses of mobility
[12], of probabilistic programs [55], abstraction of tree-based semantics [31], etc.

The paper is organized as follows. Section 3 reformulates the construction of
non-relational numerical abstract domains using the concept of basis. Section 4
explains our generic construction of weakly relational domains and applies it in
order to retrieve the zone domain and build new abstract domains. Section 5
provides a few applications and ideas for improvement. We conclude in Section 6.
Important proofs are postponed to the Annex; reading them may help to
understand the definitions chosen in Sections 3 and 4 (mainly Hypotheses 1).

3 Bases and Non-Relational Numerical Abstract Domains 

This section first recalls the concept of numerical abstract domain. We introduce
the new concept of basis and show how it can be used to retrieve standard non- 
relational domains. Introducing such a concept only for this purpose would be 
formalism for the sake of formalism. However, we will show in the next section 
how to use this concept to build our weakly relational domains. Hence, bases 
are the common denominator between classical non-relational domains and our 
weakly relational domain family. 

From the implementation point of view, bases are modules sharing a common 
signature, and we propose one functor for building non-relational domains from 
this signature, and one functor for building weakly relational domains. From the 
mathematical point of view, this approach makes our proofs modular and easier 
to handle. 
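The paper's prototype is written in OCaml, with bases as modules and domains as functors. As a loose analogue only, the following Python sketch expresses a basis signature as an abstract class and instantiates it with the constant basis of Section 3.3. The class and method names (`Basis`, `leq`, `join`, `meet`, `singleton`, `plus`, `neg`, `widen`) are hypothetical and do not come from the prototype; `None` stands for $\bot$.

```python
from abc import ABC, abstractmethod


class Basis(ABC):
    """Hypothetical signature of a basis; None encodes the least element (bottom)."""

    @abstractmethod
    def leq(self, a, b):
        """Partial order a ⊑ b."""

    @abstractmethod
    def join(self, a, b):
        """Upper bound a ⊔ b."""

    @abstractmethod
    def meet(self, a, b):
        """Over-approximation of the intersection a ⊓ b."""

    @abstractmethod
    def singleton(self, c):
        """Abstract element representing {c}."""

    @abstractmethod
    def plus(self, a, b):
        """Abstract counterpart of binary +."""

    @abstractmethod
    def neg(self, a):
        """Abstract counterpart of unary minus."""

    def widen(self, a, b):
        # Bases without strictly increasing infinite chains may simply use join.
        return self.join(a, b)


class ConstantBasis(Basis):
    """Constant basis: None (bottom), TOP, or a single constant c."""

    TOP = "T"

    def leq(self, a, b):
        return a is None or b == self.TOP or a == b

    def join(self, a, b):
        if a is None:
            return b
        if b is None:
            return a
        return a if a == b else self.TOP

    def meet(self, a, b):
        if a == self.TOP:
            return b
        if b == self.TOP:
            return a
        return a if a == b else None

    def singleton(self, c):
        return c

    def plus(self, a, b):
        if a is None or b is None:
            return None
        if a == self.TOP or b == self.TOP:
            return self.TOP
        return a + b

    def neg(self, a):
        if a is None or a == self.TOP:
            return a
        return -a


cst = ConstantBasis()
print(cst.join(3, 3), cst.join(3, 4), cst.plus(3, cst.neg(1)))  # 3 T 2
```

Encoding $\bot$ as `None` in every basis lets generic code test emptiness uniformly; this is a design choice of the sketch, not of the paper.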

3.1 Semantics and Abstract Domains 

Let P be a procedure-free, pointer-free program such as the one in Figure 1. Let
$\mathcal{V} = \{v_0, \ldots, v_{N-1}\}$ be the set of its numerical variables, with values in
the set $\mathbb{I}$ (that can be $\mathbb{Z}$, $\mathbb{Q}$, or $\mathbb{R}$). We attach to each node of the
control flow graph of P a set of environments $e \in \mathcal{D}^\flat = \mathcal{P}(\mathbb{I}^N)$, where
an environment maps each variable to its value.
The information is propagated using the following equations:

• guards, corresponding to tests in the initial program, filter the environment:
$e_{(expr\,?)} = \{ (x_0, \ldots, x_{N-1}) \in e \mid expr(x_0, \ldots, x_{N-1}) \text{ holds} \}$;

• assignments change the value of one variable:
$e_{(v_i \leftarrow expr)} = \{ (x_0, \ldots, expr(x_0, \ldots, x_{N-1}), \ldots) \mid (x_0, \ldots, x_i, \ldots) \in e \}$;

• union $\cup$ collects environments at control flow joins.

Because of loop constructs, the control flow graph contains loops and the sys- 
tem of equations described above is recursive. Classical safety semantics consider 
the least fixpoint solution. 

This semantics is not decidable in general. One thus constructs an abstract
domain [6], which is a computer-representable partially ordered set $(\mathcal{D}, \preceq)$
connected to $(\mathcal{D}^\flat, \subseteq)$ by a monotonic concretization function $\Gamma$. Guard,
assignment, and union operators have sound over-approximations in $\mathcal{D}$, that is to say:

$$\Gamma(e_{(expr\,?)}) \supseteq (\Gamma(e))_{(expr\,?)}$$

$$\Gamma(e_{(v_i \leftarrow expr)}) \supseteq (\Gamma(e))_{(v_i \leftarrow expr)}$$

$$\Gamma(e \cup f) \supseteq \Gamma(e) \cup \Gamma(f)$$

Unlike in classical data-flow analysis, $\mathcal{D}$ can have an infinite height, so one
needs a widening operator [6] to compute, in finite time, an over-approximation of
least fixpoints. The widening operator $\nabla$ should have the following properties [B]:

Definition 1. Widening.

1. $\forall x, y \in \mathcal{D}$, $x \preceq x \mathbin{\nabla} y$ and $y \preceq x \mathbin{\nabla} y$.

2. For every increasing sequence $(y_n)_{n \in \mathbb{N}}$, the sequence $(x_n)_{n \in \mathbb{N}}$ defined by
$x_0 = y_0$ and $x_{n+1} = x_n \mathbin{\nabla} y_{n+1}$ is ultimately stationary. (Ascending chain condition.)



The least fixpoint $\mathrm{lfp}_\bot F$ of an abstract operator $F$ is replaced by the limit
of the stationary sequence $X_0 = \bot$, $X_{i+1} = X_i \mathbin{\nabla} F(X_i)$; see [2] for more
information on when and how to use widenings.
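To make the stationary sequence concrete, here is a minimal Python sketch over intervals, encoded as `(lo, hi)` pairs with `None` for $\bot$ (an assumption of ours), using the classical interval widening recalled in Section 3.3. The hypothetical operator `f` models the loop head of `x = 0; while ...: x = x + 1`.

```python
import math


def join(a, b):
    """Interval union (smallest enclosing interval); None is bottom."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(a, b):
    """Classical interval widening: unstable bounds jump to infinity."""
    if a is None:
        return b
    if b is None:
        return a
    lo = a[0] if a[0] <= b[0] else -math.inf
    hi = a[1] if a[1] >= b[1] else math.inf
    return (lo, hi)


def add_const(a, c):
    return None if a is None else (a[0] + c, a[1] + c)


def lfp_with_widening(f, bottom=None):
    """Limit of X_0 = bottom, X_{i+1} = X_i ∇ F(X_i)."""
    x = bottom
    while True:
        nxt = widen(x, f(x))
        if nxt == x:
            return x
        x = nxt


def f(x):
    # Values reaching the loop head: the initial 0, or a previous value plus 1.
    return join((0, 0), add_const(x, 1))


print(lfp_with_widening(f))  # (0, inf)
```

The sequence is $\bot$, $[0, 0]$, $[0, +\infty]$, after which it is stationary; without widening it would climb through $[0, n]$ forever.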

It is a major result of abstract interpretation that, when computing in the 
abstract domain with widenings, one obtains, in finite time, a sound over- 
approximation of the initial semantics. 

3.2 Bases 

We call basis a structure that represents and manipulates subsets of $\mathbb{I}$ in a way
suitable to build a non-relational abstract domain. Such bases will then be used
in the following section to build our family of relational domains. A basis is given by:



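Definition 2. Basis.

1. A computer-representable set $\mathcal{C}$ with partial order $\sqsubseteq$ and least element $\bot$.

2. A strict, monotonic, injective concretization $\gamma : \mathcal{C} \to \mathcal{P}(\mathbb{I})$.

3. Each element $C \subseteq \mathbb{I}$ has an over-approximation $C^\sharp \in \mathcal{C}$: $\gamma(C^\sharp) \supseteq C$.

4. There exists an over-approximation $\sqcap$ for the intersection:
$\gamma(C_1^\sharp \sqcap C_2^\sharp) \supseteq \gamma(C_1^\sharp) \cap \gamma(C_2^\sharp)$.

5. There exists an upper bound $\sqcup$: $C_1^\sharp, C_2^\sharp \sqsubseteq C_1^\sharp \sqcup C_2^\sharp$.

6. Each $k$-ary arithmetic expression $expr_k(c_1, \ldots, c_k)$ has an abstract over-approximated
counterpart $expr_k^\sharp(C_1^\sharp, \ldots, C_k^\sharp)$:
$\gamma(expr_k^\sharp(C_1^\sharp, \ldots, C_k^\sharp)) \supseteq \{ expr_k(c_1, \ldots, c_k) \mid c_i \in \gamma(C_i^\sharp) \}$.

7. If $\mathcal{C}$ has strictly increasing infinite chains, there is a widening operator $\nabla$.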

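By strictness, $\gamma(\bot) = \emptyset$. Thanks to points 2 and 3, there exists a unique
abstract element $\top$ such that $\gamma(\top) = \mathbb{I}$. The upper bound $\sqcup$ is also an
over-approximation for the union: $\gamma(C_1^\sharp \sqcup C_2^\sharp) \supseteq \gamma(C_1^\sharp) \cup \gamma(C_2^\sharp)$.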

3.3 A few Classical Bases 

We now present a few bases that allow us to retrieve the non-relational
constant propagation [6], interval [5], and congruence [14] domains.

Constant Basis. $\mathcal{C}^{cst} = \{\bot, \top\} \cup \{ c^\sharp \mid c \in \mathbb{I} \}$.

All abstract operators are straightforward and not discussed here (see [S] for 
more details). There is no need for a widening operator. 

Interval Basis. $\mathcal{C}^{[a,b]} = \{\bot\} \cup \{ [a, b] \mid a \in \mathbb{I} \cup \{-\infty\},\ b \in \mathbb{I} \cup \{+\infty\},\ a \le b \}$.¹

Most abstract operators are straightforward (see [5] for more details). We
will only recall here the classical widening operator:

$$[a_1, b_1] \mathbin{\nabla} [a_2, b_2] = \left[ \begin{cases} a_1 & \text{if } a_1 \le a_2 \\ -\infty & \text{elsewhere} \end{cases},\ \begin{cases} b_1 & \text{if } b_1 \ge b_2 \\ +\infty & \text{elsewhere} \end{cases} \right]$$

In $\mathbb{Q}$ or $\mathbb{R}$, one can alternatively define the open interval lattice $\mathcal{C}^{]a,b[}$ in the same
way. One can even combine this information to obtain a basis where each
bound may or may not be included.

Congruence Basis. $\mathcal{C}^{a\mathbb{Z}+b} = \{\bot\} \cup \{ (a\mathbb{Z} + b) \mid a \in \mathbb{N}^* \cup \{\infty\},\ b \in \mathbb{Z} \}$.

A basis is built on $\mathcal{C}^{a\mathbb{Z}+b}$ thanks to the operators described in Figure 4 (using
the definitions of Figure 3). However, for the sake of conciseness, Figure 4 does
not present abstract $k$-ary arithmetic expressions, but only the binary plus, which
is denoted by the infix $\boxplus$ operator (see [14] for more details). There is no strictly
increasing infinite chain, so there is no need for a widening operator.

One may also consider adapting the definitions of Figures 3 and 4 to rational
congruences [16]: $\mathcal{C}^{a\mathbb{Z}+b}_{\mathbb{Q}} = \{\bot\} \cup \{ (a\mathbb{Z} + b) \mid a \in \mathbb{Q}_{>0} \cup \{\infty\},\ b \in \mathbb{Q} \}$.

1 Bounds are part of the interval only if finite. Do not be confused by closed interval
notations such as $[a, +\infty]$; the interval cannot contain infinite elements.



In the following, $x, x' \in \mathbb{Z}$ and $y, y' \in \mathbb{N}^* \cup \{\infty\}$:

• $y / y' \iff$ $y$ is a divisor of $y'$ ($\exists k \in \mathbb{N}^*$ such that $y' = ky$), or $y' = \infty$;

• $x \equiv x'\ [y] \iff$ $x \neq x'$ and $y / |x - x'|$, or $x = x'$;

• $\vee$ is the least common multiple, extended by $y \vee \infty = \infty \vee y = \infty$;

• $\wedge$ is the greatest common divisor, extended by $y \wedge \infty = \infty \wedge y = y$.

Fig. 3. Classical arithmetic operators extended to $\mathbb{N}^* \cup \{\infty\}$.
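The congruence operators of Figure 4, built on the extended arithmetic of Figure 3, can be sketched in Python as follows. The encoding is an assumption of ours: `(a, b)` stands for $a\mathbb{Z} + b$, `a = 0` plays the role of $a = \infty$ (so `math.gcd(y, 0) == y` mimics $y \wedge \infty = y$), and `None` is $\bot$. The exact intersection solves the two congruences with the generalized Chinese remainder theorem; `pow(x, -1, m)` (Python 3.8+) computes a modular inverse.

```python
import math


def cong(a, b):
    """Normalize aZ + b so that 0 <= b < a when a is finite (a = 0 encodes ∞)."""
    return (a, b % a) if a else (a, b)


def member(x, c):
    """x ∈ γ(c)."""
    if c is None:
        return False
    a, b = c
    return x == b if a == 0 else (x - b) % a == 0


def join(c1, c2):
    """(aZ + b) ⊔ (a'Z + b') = (a ∧ a' ∧ |b - b'|)Z + min(b, b')."""
    if c1 is None:
        return c2
    if c2 is None:
        return c1
    (a1, b1), (a2, b2) = c1, c2
    return cong(math.gcd(math.gcd(a1, a2), abs(b1 - b2)), min(b1, b2))


def meet(c1, c2):
    """Exact intersection via the generalized Chinese remainder theorem."""
    if c1 is None or c2 is None:
        return None
    (a1, b1), (a2, b2) = c1, c2
    if a1 == 0:
        return c1 if member(b1, c2) else None
    if a2 == 0:
        return c2 if member(b2, c1) else None
    g = math.gcd(a1, a2)
    if (b2 - b1) % g != 0:
        return None  # b and b' are not congruent modulo a ∧ a'
    # Solve b1 + a1 * t ≡ b2 (mod a2) for t.
    t = ((b2 - b1) // g) * pow(a1 // g, -1, a2 // g) % (a2 // g)
    return cong(a1 * a2 // g, b1 + a1 * t)  # modulus a ∨ a'


def plus(c1, c2):
    """(aZ + b) ⊞ (a'Z + b') = (a ∧ a')Z + (b + b')."""
    if c1 is None or c2 is None:
        return None
    (a1, b1), (a2, b2) = c1, c2
    return cong(math.gcd(a1, a2), b1 + b2)


def neg(c):
    """⊟(aZ + b) = aZ + (-b)."""
    return None if c is None else cong(c[0], -c[1])


# One step of the random walk of Figure 1: x ← x + {-1, 1}.
step = join(cong(0, 1), cong(0, -1))  # 2Z + 1
print(step, plus(cong(0, 0), step), meet(cong(2, 1), cong(3, 2)))  # (2, 1) (2, 1) (6, 5)
```

Encoding $\infty$ as 0 is convenient because 0 behaves as the top of the divisibility order, which is exactly the role $\infty$ plays in Figure 3.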



3.4 Building Non-relational Domains from Bases 

Building a non-relational domain $(\mathcal{D}, \preceq)$ from a basis $(\mathcal{C}, \sqsubseteq)$ is straightforward:

• We set $\mathcal{D} = \mathcal{V} \to \mathcal{C}$.

• The concretization $\Gamma$, order $\preceq$, union $\cup$, and widening $\nabla$ are simply
point-wise versions of the corresponding operators on the basis.

• Assignments are defined using the abstract counterpart of expressions:
$(C_0^\sharp, \ldots, C_i^\sharp, \ldots)_{(v_i \leftarrow expr)} = (C_0^\sharp, \ldots, expr_N^\sharp(C_0^\sharp, \ldots, C_{N-1}^\sharp), \ldots)$.

• Only non-relational guards $(v_i \in C\,?)$ do some filtering:
$(C_0^\sharp, \ldots, C_i^\sharp, \ldots)_{(v_i \in C\,?)} \stackrel{\text{def}}{=} (C_0^\sharp, \ldots, C_i^\sharp \sqcap C^\sharp, \ldots)$ where $\gamma(C^\sharp) \supseteq C$.
In other guard cases, it is safe to use the identity function.

From an implementation point of view, the non-relational domain is simply 
a generic functor module, and each basis implementation is a module. 
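A minimal Python sketch of this point-wise lifting, assuming the interval basis and abstract environments stored as dictionaries from variable names to `(lo, hi)` intervals (`None` for $\bot$); the function names are hypothetical. It replays the inner loop of Figure 1 five times without widening; since each variable is abstracted separately, no relation between x and i is kept.

```python
import math


def join(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def plus(a, b):
    if a is None or b is None:
        return None
    return (a[0] + b[0], a[1] + b[1])


def env_join(e1, e2):
    """Point-wise union of two abstract environments."""
    return {v: join(e1[v], e2[v]) for v in e1}


def env_assign(env, var, expr):
    """(var <- expr), where expr evaluates with the basis' abstract operators."""
    out = dict(env)
    out[var] = expr(env)
    return out


def env_guard(env, var, c):
    """Non-relational guard (var ∈ C ?): meet the variable's value with C."""
    out = dict(env)
    out[var] = meet(env[var], c)
    return out


env = {"x": (0, 0), "i": (1, 1)}
step = join((1, 1), (-1, -1))  # abstract value of the random increment: [-1, 1]
for _ in range(5):
    env = env_assign(env, "x", lambda e: plus(e["x"], step))
print(env["x"], env_guard(env, "x", (0, math.inf))["x"])  # (-5, 5) (0, 5)
```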



4 Building Weakly Relational Domains from Bases 

Now we would like to represent relations of the form $v_j - v_i \in \gamma(C)$ where $C$
lives in a basis $\mathcal{C}$ (instead of $v_i \in \gamma(C)$). A plain basis is not sufficient; we
need a way, a so-called closure, to propagate relational information. The main
result of this paper can be summarized as follows:

basis (with extra hypotheses) + closure ⟹ weakly relational domain



4.1 Hypotheses on the Basis 

Not all bases $\mathcal{C}$ are acceptable. We need the following extra hypotheses:

Hypotheses 1. Acceptable Bases.

1. There exist exact abstract counterparts for the intersection $\sqcap$ (which should
also be a lower bound for $\sqsubseteq$), unary minus $\boxminus$, and binary plus $\boxplus$ operators:

• $\gamma(x \boxplus y) = \{ a + b \mid a \in \gamma(x),\ b \in \gamma(y) \}$; (Abstract plus.)

• $\gamma(\boxminus x) = \{ -a \mid a \in \gamma(x) \}$; (Abstract opposite.)

• $x \sqcap y \sqsubseteq x, y$, so $\gamma(x \sqcap y) = \gamma(x) \cap \gamma(y)$. (Abstract intersection.)



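• Concretization:
  $\gamma(C) = \{ ak + b \mid k \in \mathbb{Z} \}$ if $C = (a\mathbb{Z} + b)$, $a \neq \infty$;
  $\gamma(C) = \{b\}$ if $C = (\infty\mathbb{Z} + b)$;
  $\gamma(C) = \emptyset$ if $C = \bot$.

• Order:
  - $(a\mathbb{Z} + b) \sqsubseteq (a'\mathbb{Z} + b') \iff a' / a$ and $b \equiv b'\ [a']$.
  - $\bot \sqsubseteq C^\sharp$, $\forall C^\sharp \in \mathcal{C}$.

• Intersection (exact abstract counterpart for the intersection $\cap$):
  - $(a\mathbb{Z} + b) \sqcap (a'\mathbb{Z} + b') \stackrel{\text{def}}{=} (a \vee a')\mathbb{Z} + b''$ if $b \equiv b'\ [a \wedge a']$, and $\bot$ elsewhere,
    where $b''$ is such that $b'' \equiv b\ [a]$ and $b'' \equiv b'\ [a']$ (Bézout theorem).
  - $\bot \sqcap C^\sharp = C^\sharp \sqcap \bot = \bot$, $\forall C^\sharp \in \mathcal{C}$.

• Least Upper Bound:
  - $(a\mathbb{Z} + b) \sqcup (a'\mathbb{Z} + b') = (a \wedge a' \wedge |b - b'|)\mathbb{Z} + \min(b, b')$.
  - $\bot \sqcup C^\sharp = C^\sharp \sqcup \bot = C^\sharp$, $\forall C^\sharp \in \mathcal{C}$.

• Sum (exact abstract counterpart for the binary + operator):
  - $(a\mathbb{Z} + b) \boxplus (a'\mathbb{Z} + b') = (a \wedge a')\mathbb{Z} + (b + b')$.
  - $\bot \boxplus C^\sharp = C^\sharp \boxplus \bot = \bot$, $\forall C^\sharp \in \mathcal{C}$.

Fig. 4. Concretization and abstract operators in $\mathcal{C}^{a\mathbb{Z}+b}$.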



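2. Each singleton $\{c\}$, $c \in \mathbb{I}$, must be exactly represented by an abstract element
$c^\sharp \in \mathcal{C}$: $\gamma(c^\sharp) = \{c\}$.

3. For each finite family $(x_j)_{j \in J}$:
$\bigsqcap_{j \in J} x_j = \bot \implies \exists j, j' \in J,\ x_j \sqcap x_{j'} = \bot$.

4. $\boxplus$ distributes over $\sqcap$: for each family $(x_j)_{j \in J}$ and element $x$ of $\mathcal{C}$,
if $\bigsqcap_{j \in J} x_j \neq \bot$, then $\bigsqcap_{j \in J} (x \boxplus x_j) = x \boxplus \left( \bigsqcap_{j \in J} x_j \right)$.

5. $\boxminus$ distributes over $\boxplus$ and $\sqcap$. $\boxplus$ and $\sqcap$ are commutative and associative.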

These hypotheses were stated in order to prove our main theorem, which
is the correctness of the closure operator. Thus, one may have to wait until
Theorem 6, whose proof is postponed to Annex A, to fully understand their purpose.
Note that Hypotheses 1.3 and 1.4 are very strong. The full basis $\mathcal{C} = \mathcal{P}(\mathbb{I})$,
for instance, does not respect them.



Remark. There is a resemblance between bases respecting Hypotheses 1 and
the classical graph-theory notion of complete dioid [13], an extension of exotic
algebras. A complete dioid is a complete semi-lattice with an addition (our $\sqcap$),
and a multiplication (our $\boxplus$) that distributes over the addition. However, full
distributivity in dioids implies that $\bot \boxplus \top = \top$, whereas we would have preferred
$\bot \boxplus \top = \bot$. Thus, in our framework, distributivity is restricted (Hypothesis 1.4).

4.2 Representing Relations 

A set of constraints of the form $v_j - v_i \in \gamma(C)$, $C \in \mathcal{C}$, is now represented by a
coherent constraint matrix:

Definition 3. Constraint Matrices.

1. A constraint matrix $m$ is an $N \times N$ matrix with elements in $\mathcal{C}$; the element
$m_{ij}$ represents the constraint $v_j - v_i \in \gamma(m_{ij})$.

2. We suppose, as an implicit constraint, that $v_0 = 0$, so that unary constraints
$v_i \in \gamma(m_{0i})$ can be represented as $v_i - v_0 \in \gamma(m_{0i})$.

3. $m$ is coherent if $\forall i, j,\ m_{ji} = \boxminus m_{ij}$ and $\forall i,\ \gamma(m_{ii}) = \{0\}$.

4. $m$ represents the set (so-called concretization of $m$):

$$\Gamma(m) = \{ (x_0, \ldots, x_{N-1}) \in \mathbb{I}^N \mid x_0 = 0,\ \forall i, j,\ x_j - x_i \in \gamma(m_{ij}) \}.$$

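Our abstract domain is the set $\mathcal{D}$ of coherent constraint matrices, ordered by
the point-wise extension $\preceq$ of the partial order $\sqsubseteq$ on $\mathcal{C}$:

$m \preceq n \iff \forall i, j,\ m_{ij} \sqsubseteq n_{ij}$;

$m = n \iff \forall i, j,\ m_{ij} = n_{ij}$;

$\bot = \inf_\preceq \mathcal{D}$ is such that $\forall i, j,\ \bot_{ij} = \bot$.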
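Definition 3 and the point-wise order can be prototyped over the interval basis as below. This is a sketch under our own conventions: a matrix is a list of lists of `(lo, hi)` pairs (or `None` for $\bot$), `make_matrix`, `in_gamma`, and `leq` are hypothetical helper names, and index 0 is the constant variable $v_0 = 0$.

```python
import math

TOP = (-math.inf, math.inf)  # interval basis element ]-inf, +inf[; None is bottom


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def neg(a):
    return None if a is None else (-a[1], -a[0])


def make_matrix(n, constraints):
    """Coherent matrix from {(i, j): (lo, hi)}, each meaning lo <= v_j - v_i <= hi."""
    m = [[TOP] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = (0, 0)  # γ(m_ii) = {0}
    for (i, j), itv in constraints.items():
        m[i][j] = meet(m[i][j], itv)
        m[j][i] = meet(m[j][i], neg(itv))  # keep m_ji = ⊟ m_ij
    return m


def in_gamma(m, point):
    """Membership in Γ(m); point[0] is the constant variable v_0 = 0."""
    n = len(m)
    return point[0] == 0 and all(
        m[i][j] is not None and m[i][j][0] <= point[j] - point[i] <= m[i][j][1]
        for i in range(n)
        for j in range(n)
    )


def leq(m, n):
    """Point-wise order m ≼ n over the interval basis."""
    def sub(a, b):
        return a is None or (b is not None and b[0] <= a[0] and a[1] <= b[1])
    return all(sub(x, y) for rm, rn in zip(m, n) for x, y in zip(rm, rn))


# v1 ∈ [0, 5] (stored as v1 - v0) and v2 - v1 ∈ [1, 2].
m = make_matrix(3, {(0, 1): (0, 5), (1, 2): (1, 2)})
print(in_gamma(m, (0, 3, 4)), in_gamma(m, (0, 3, 6)))  # True False
```

Storing both $m_{ij}$ and $m_{ji}$ duplicates every constraint, which is the redundancy mentioned later for the zone domain.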

The concretization function on $\mathcal{D}$ is $\Gamma$, and we have:

Theorem 4. Monotony of $\Gamma$.

1. $m \preceq n \implies \Gamma(m) \subseteq \Gamma(n)$.

2. $m = n \implies \Gamma(m) = \Gamma(n)$. □

However, these implications are not equivalences: we can have two different constraint
matrices $m \neq n$ with the same concretization $\Gamma(m) = \Gamma(n)$.

4.3 General Closure Operator 

Implicit Constraints. Because our abstract domain is relational, the constraints
between variables are not independent. One can deduce a constraint on
$x - z$ by adding a constraint on $x - y$ to a constraint on $y - z$. Such deduced
constraints are called implicit constraints because they may not be present
explicitly in $m$. More generally, given any path $(i = i_1, \ldots, i_n = j)$ in $m$, we can
construct the following implicit constraint:

$$v_j - v_i \in \gamma\left( \boxplus_{k=1}^{n-1} m_{i_k i_{k+1}} \right)$$

Shortest-Path Closure. A nice property of DBMs [H] and periodicity graphs
[28] that will hold for our constraint matrices is that the concretization is entirely
determined by the set of implicit constraints of the above form. DBMs use any
shortest-path closure algorithm in order to make all implicit constraints explicit.
Here, we adapt the Floyd-Warshall algorithm [4, §25.3] to our matrices.

Definition 5. Closure. Let $m$ be a coherent matrix. Its closure is the result
$m^*$ of the following modified Floyd-Warshall algorithm:

$$m^0 \stackrel{\text{def}}{=} m, \qquad m^{k+1}_{ij} \stackrel{\text{def}}{=} m^k_{ij} \sqcap (m^k_{ik} \boxplus m^k_{kj}), \qquad m^* \stackrel{\text{def}}{=} m^N.$$



The Floyd-Warshall algorithm was chosen because it is easy to understand,
straightforward to implement, and easy to adapt to constraint matrices. It
performs $O(N^3)$ elementary basis operations.
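The modified Floyd-Warshall algorithm of Definition 5 translates directly into the Python sketch below, over the interval basis with our own encoding (`(lo, hi)` pairs, `None` for $\bot$; helper names hypothetical). It builds $m^{k+1}$ from $m^k$ exactly as in the definition, and uses Theorem 6.2 (stated next) to test emptiness.

```python
import math

TOP = (-math.inf, math.inf)  # interval basis; None is bottom


def meet(a, b):
    """Exact intersection of two intervals."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def plus(a, b):
    """Exact abstract sum {x + y | x in a, y in b}."""
    if a is None or b is None:
        return None
    return (a[0] + b[0], a[1] + b[1])


def neg(a):
    """Exact abstract opposite."""
    return None if a is None else (-a[1], -a[0])


def make_matrix(n, constraints):
    """Coherent matrix from {(i, j): (lo, hi)}, each meaning lo <= v_j - v_i <= hi."""
    m = [[TOP] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = (0, 0)
    for (i, j), itv in constraints.items():
        m[i][j] = meet(m[i][j], itv)
        m[j][i] = meet(m[j][i], neg(itv))
    return m


def closure(m):
    """m^{k+1}_ij = m^k_ij ⊓ (m^k_ik ⊞ m^k_kj) for k = 0..N-1; returns m^N."""
    n = len(m)
    for k in range(n):
        prev = m
        m = [[meet(prev[i][j], plus(prev[i][k], prev[k][j])) for j in range(n)]
             for i in range(n)]
    return m


def is_empty(m):
    """Γ(m) is empty iff some diagonal entry of m* is bottom."""
    mc = closure(m)
    return any(mc[i][i] is None for i in range(len(mc)))


# v1 ∈ [0, 5] and v2 - v1 ∈ [1, 2]; the closure makes v2 ∈ [1, 7] explicit.
m = make_matrix(3, {(0, 1): (0, 5), (1, 2): (1, 2)})
print(closure(m)[0][2])  # (1, 7)
# v1 ∈ [0, 1], v2 - v1 ∈ [5, 6] and v2 ∈ [0, 2] are contradictory.
print(is_empty(make_matrix(3, {(0, 1): (0, 1), (1, 2): (5, 6), (0, 2): (0, 2)})))  # True
```

The three nested loops make the $O(N^3)$ count of basis operations visible; building a fresh matrix per pivot is for fidelity to the definition, not efficiency.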

Here is the main theorem of this paper. The following results will be used
extensively in Section 4.4 in order to design abstract operators. The proof of this
theorem relies heavily on Hypotheses 1; in fact, the proof itself motivated the
hypotheses.

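Theorem 6. Closure.

1. $\Gamma(m^*) = \Gamma(m)$.

2. $\Gamma(m) = \emptyset \iff \exists i,\ m^*_{ii} = \bot$.

3. If $\Gamma(m) \neq \emptyset$, then $m^*$ enjoys the following properties:

• $m^*$ is a coherent matrix; (Coherence.)

• $\forall i, j,\ m^*_{ij} = \bigsqcap_{(i = i_1, \ldots, i_n = j)} \boxplus_{k=1}^{n-1} m_{i_k i_{k+1}}$; (Transitive closure.)

• $\forall i, j,\ \forall c \in \gamma(m^*_{ij}),\ \exists (x_0, \ldots, x_{N-1}) \in \Gamma(m),\ x_j - x_i = c$; (Saturation.)

• $m^* = \inf_\preceq \{ n \mid \Gamma(m) = \Gamma(n) \}$; (Normal form.)

• $m^{**} = m^*$. (Closure.)

□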



Incremental Closure. When slightly modifying a closed matrix, we do not
need to run the whole modified Floyd-Warshall algorithm to get the
closure of the new matrix. If the upper-left $M \times M$ sub-matrix of $m$ is already
closed, we can use the following $O((N - M) \cdot N^2)$ algorithm:

$$m^0 \stackrel{\text{def}}{=} m; \qquad m^{k+1}_{ij} \stackrel{\text{def}}{=} \begin{cases} m^k_{ij} & \text{if } i, j, k < M \\ m^k_{ij} \sqcap (m^k_{ik} \boxplus m^k_{kj}) & \text{elsewhere} \end{cases}; \qquad m^* \stackrel{\text{def}}{=} m^N.$$

We can easily adapt the algorithm, by permuting variables, to get a general
incremental closure algorithm performing $O(N^2 \cdot c)$ elementary basis operations,
where $c$ is the number of rows and columns that have changed since the last
closure.
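The incremental variant can be sketched by reusing the same update while copying the entries whose indices are all below $M$. The helper names are ours, and for readability the sketch rebuilds the matrix at every pivot, so it does not by itself achieve the stated complexity; an in-place version that skips the copied entries would.

```python
import math

TOP = (-math.inf, math.inf)  # interval basis; None is bottom


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def plus(a, b):
    if a is None or b is None:
        return None
    return (a[0] + b[0], a[1] + b[1])


def neg(a):
    return None if a is None else (-a[1], -a[0])


def make_matrix(n, constraints):
    m = [[TOP] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = (0, 0)
    for (i, j), itv in constraints.items():
        m[i][j] = meet(m[i][j], itv)
        m[j][i] = meet(m[j][i], neg(itv))
    return m


def closure(m):
    n = len(m)
    for k in range(n):
        prev = m
        m = [[meet(prev[i][j], plus(prev[i][k], prev[k][j])) for j in range(n)]
             for i in range(n)]
    return m


def incremental_closure(m, M):
    """Closure of m, assuming its upper-left M x M sub-matrix is already closed."""
    n = len(m)
    for k in range(n):
        prev = m
        m = [[prev[i][j] if i < M and j < M and k < M  # unchanged: already closed
              else meet(prev[i][j], plus(prev[i][k], prev[k][j]))
              for j in range(n)] for i in range(n)]
    return m


# The 2 x 2 block over (v0, v1) is closed; v2 was just added with v2 - v1 ∈ [1, 2].
m = make_matrix(3, {(0, 1): (0, 5), (1, 2): (1, 2)})
print(incremental_closure(m, 2) == closure(m))  # True
```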



4.4 Generic Operators 



Emptiness Testing. Testing the satisfiability of a constraint matrix is done
using Theorem 6.2. Unlike the constraint programming approach, we do not use a
specific loop-based satisfiability algorithm, but let our generic closure algorithm
solve both the satisfiability and the normal form problems at once.

Equality, Inclusion Testing. The normal form property of Theorem 6.3 allows
us to easily test equality and inclusion of non-empty concretizations:

Theorem 7. Equality and Inclusion Testing.

1. $\Gamma(m) = \Gamma(n) \iff m^* = n^*$.

2. $\Gamma(m) \subseteq \Gamma(n) \iff m^* \preceq n$. □

Note that we do not need to close the right argument when testing inclusion.
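Theorem 7 yields the following inclusion and equality tests, sketched in Python over the interval basis with `(lo, hi)` pairs and `None` for $\bot$ (hypothetical names). The example shows why the left argument must be closed: the bound on $v_2$ implied by $m$ only appears in $m^*$.

```python
import math

TOP = (-math.inf, math.inf)  # interval basis; None is bottom


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def plus(a, b):
    if a is None or b is None:
        return None
    return (a[0] + b[0], a[1] + b[1])


def neg(a):
    return None if a is None else (-a[1], -a[0])


def make_matrix(n, constraints):
    m = [[TOP] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = (0, 0)
    for (i, j), itv in constraints.items():
        m[i][j] = meet(m[i][j], itv)
        m[j][i] = meet(m[j][i], neg(itv))
    return m


def closure(m):
    n = len(m)
    for k in range(n):
        prev = m
        m = [[meet(prev[i][j], plus(prev[i][k], prev[k][j])) for j in range(n)]
             for i in range(n)]
    return m


def leq(m, n):
    """Point-wise order m ≼ n."""
    def sub(a, b):
        return a is None or (b is not None and b[0] <= a[0] and a[1] <= b[1])
    return all(sub(x, y) for rm, rn in zip(m, n) for x, y in zip(rm, rn))


def included(m, n):
    """Γ(m) ⊆ Γ(n): only the left argument is closed."""
    mc = closure(m)
    if any(mc[i][i] is None for i in range(len(mc))):
        return True  # Γ(m) is empty
    return leq(mc, n)


def same_gamma(m, n):
    """Γ(m) = Γ(n) for non-empty concretizations."""
    return closure(m) == closure(n)


m = make_matrix(3, {(0, 1): (0, 5), (1, 2): (1, 2)})  # implies v2 ∈ [1, 7]
n = make_matrix(3, {(0, 2): (0, 10)})                 # v2 ∈ [0, 10]
print(leq(m, n), included(m, n), included(n, m))  # False True False
```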

Union, Intersection. $\gamma(\mathcal{C})$ is stable under intersection, so we simply extend
$\sqcap$ point-wise to represent the intersection of two concretizations:

$$[m \sqcap n]_{ij} \stackrel{\text{def}}{=} m_{ij} \sqcap n_{ij}.$$

Theorem 8. Intersection. $\Gamma(m \sqcap n) = \Gamma(m) \cap \Gamma(n)$. □

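$\gamma(\mathcal{C})$ is not generally closed under union, and neither is $\Gamma(\mathcal{D})$. However, if there
exists an upper bound $\sqcup$ in $\mathcal{C}$, we can extend it point-wise to $\mathcal{D}$:

$$[m \sqcup n]_{ij} \stackrel{\text{def}}{=} m_{ij} \sqcup n_{ij}.$$

If $\sqcup$ is a least upper bound, its point-wise extension can be used to determine the
least upper bound of two concretizations, provided the arguments are closed matrices.

Theorem 9. (Least) Upper Bound.

1. If $\forall a, b \in \mathcal{C},\ \gamma(a \sqcup b) \supseteq \gamma(a) \cup \gamma(b)$,
then $\Gamma(m \sqcup n) \supseteq \Gamma(m) \cup \Gamma(n)$. (Upper bound.)

2. If $\gamma(a \sqcup b) = \inf_\subseteq \{ \gamma(c) \mid \gamma(c) \supseteq \gamma(a) \cup \gamma(b) \}$, then
$\Gamma(m^* \sqcup n^*) = \inf_\subseteq \{ \Gamma(o) \mid \Gamma(o) \supseteq \Gamma(m) \cup \Gamma(n) \}$. (Least upper bound.)

3. $(m^* \sqcup n^*)^* = m^* \sqcup n^*$. ($\sqcup$ respects closure.)

□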



Widening. $\mathcal{D}$ has strictly increasing infinite chains only if $\mathcal{C}$ has. A widening
$\nabla$ is obtained on $\mathcal{D}$ by point-wise application of the widening $\nabla$ on $\mathcal{C}$:

$$[m \mathbin{\nabla} n]_{ij} \stackrel{\text{def}}{=} m_{ij} \mathbin{\nabla} n_{ij}.$$

$\nabla$ respects Definition 1. Thus, the least fixpoint of an operator $F$ can be
over-approximated by the limit of the stationary sequence $X_{i+1} = X_i \mathbin{\nabla} F(X_i)$. One
could expect, as for the least upper bound, to get better precision by closing
the arguments of $\nabla$, but this is not the case. Even worse, enforcing the closure
of the chain by computing $X_{i+1} = (X_i \mathbin{\nabla} F(X_i))^*$ breaks the ascending chain
condition and prevents the analysis from terminating in some cases. We advocate
here the use of the following iteration: $X_{i+1} = X_i \mathbin{\nabla} F(X_i^*)$.
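The sketch below (Python, interval basis, hypothetical names) combines the point-wise widening with the advocated iteration $X_{i+1} = X_i \mathbin{\nabla} F(X_i^*)$ on the loop `i = 0; x = 0; while ...: i++; x++`, with $v_1 = i$ and $v_2 = x$. The helper `incr` implements the exact assignment $v_i \leftarrow v_i + c$ given in the Assignment paragraph. Both variables' bounds are widened to infinity, yet the relation $x - i = 0$ survives.

```python
import math

TOP = (-math.inf, math.inf)  # interval basis; None is bottom


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def join(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(a, b):
    """Classical interval widening."""
    if a is None:
        return b
    if b is None:
        return a
    return (a[0] if a[0] <= b[0] else -math.inf, a[1] if a[1] >= b[1] else math.inf)


def plus(a, b):
    if a is None or b is None:
        return None
    return (a[0] + b[0], a[1] + b[1])


def pointwise(op, m, n):
    return [[op(x, y) for x, y in zip(rm, rn)] for rm, rn in zip(m, n)]


def closure(m):
    n = len(m)
    for k in range(n):
        prev = m
        m = [[meet(prev[i][j], plus(prev[i][k], prev[k][j])) for j in range(n)]
             for i in range(n)]
    return m


def incr(m, i, c):
    """Exact (v_i <- v_i + c): row i shifted by -c, column i by +c, diagonal kept."""
    n = len(m)
    return [[plus(m[k][l], (-c, -c)) if k == i and l != i
             else plus(m[k][l], (c, c)) if l == i and k != i
             else m[k][l]
             for l in range(n)] for k in range(n)]


def lfp(f, bottom):
    """Iterate X_{i+1} = X_i ∇ F(X_i*) until stabilization."""
    x = bottom
    while True:
        nxt = pointwise(widen, x, f(closure(x)))
        if nxt == x:
            return x
        x = nxt


init = [[(0, 0)] * 3 for _ in range(3)]  # v0 = 0, i = 0, x = 0
bottom = [[None] * 3 for _ in range(3)]
result = lfp(lambda x: pointwise(join, init, incr(incr(x, 1, 1), 2, 1)), bottom)
print(result[0][1], result[1][2])  # (0, inf) (0, 0): i >= 0 and x - i = 0
```

Closing only the argument of $F$, and never the widened iterate itself, is what keeps the sequence stationary, as argued above.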



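Guard. We can easily implement tests of the form $(v_j - v_i \in C\,?)$:

$$[m_{(v_j - v_i \in C\,?)}]_{kl} \stackrel{\text{def}}{=} \begin{cases} m_{kl} \sqcap C^\sharp & \text{if } (k, l) = (i, j) \\ m_{kl} \sqcap (\boxminus C^\sharp) & \text{if } (k, l) = (j, i) \\ m_{kl} & \text{elsewhere} \end{cases}$$

choosing $C^\sharp$ such that $\gamma(C^\sharp) \supseteq C$.

Tests of the form $(v_j \in C\,?)$ are implemented by choosing $i = 0$.
For other tests, it is safe to do nothing:

$$m_{(expr\,?)} \stackrel{\text{def}}{=} m.$$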



Projection. In order to find the set of values that a variable can take, we use
the following theorem derived from the saturation property of the closure:

Theorem 10. $\{ x \mid \exists (x_0, \ldots, x_{N-1}) \in \Gamma(m) \text{ with } x_i = x \} = \gamma(m^*_{0i})$. □
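Guards and projection can be sketched together in Python over the interval basis (our encoding and names): `guard(m, i, j, c)` implements the test $(v_j - v_i \in C\,?)$ with the interval `c` over-approximating $C$, and `project` reads $\gamma(m^*_{0i})$ as in Theorem 10.

```python
import math

TOP = (-math.inf, math.inf)  # interval basis; None is bottom


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def plus(a, b):
    if a is None or b is None:
        return None
    return (a[0] + b[0], a[1] + b[1])


def neg(a):
    return None if a is None else (-a[1], -a[0])


def make_matrix(n, constraints):
    m = [[TOP] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = (0, 0)
    for (i, j), itv in constraints.items():
        m[i][j] = meet(m[i][j], itv)
        m[j][i] = meet(m[j][i], neg(itv))
    return m


def closure(m):
    n = len(m)
    for k in range(n):
        prev = m
        m = [[meet(prev[i][j], plus(prev[i][k], prev[k][j])) for j in range(n)]
             for i in range(n)]
    return m


def guard(m, i, j, c):
    """Test (v_j - v_i ∈ C ?); use i = 0 for a non-relational test (v_j ∈ C ?)."""
    out = [row[:] for row in m]
    out[i][j] = meet(m[i][j], c)
    out[j][i] = meet(m[j][i], neg(c))
    return out


def project(m, i):
    """The values v_i can take: γ(m*_{0i})."""
    return closure(m)[0][i]


m = guard(make_matrix(3, {(0, 1): (0, 5)}), 1, 2, (1, 2))  # v1 ∈ [0, 5], v2 - v1 ∈ [1, 2] ?
print(project(m, 2), project(guard(m, 0, 2, (0, 3)), 2))  # (1, 7) (1, 3)
```

A guard contradicting the implicit bound, such as $(v_2 \in [10, 20]\,?)$ here, makes the projection return `None`, i.e., $\bot$.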



Forget. Forgetting the value of a variable is useful to implement the random
assignment $(v_i \leftarrow ?)$, which also serves as a coarse approximation for complex
assignments. Before forgetting all information on a variable, one should close
the argument matrix so that we do not lose implicit constraints:

$$[m_{(v_i \leftarrow ?)}]_{kl} \stackrel{\text{def}}{=} \begin{cases} \top & \text{if } k = i \text{ or } l = i \\ m^*_{kl} & \text{elsewhere} \end{cases}$$

Theorem 11.

$$\Gamma(m_{(v_i \leftarrow ?)}) = \{ (x_0, \ldots, x_i, \ldots) \mid \exists x,\ (x_0, \ldots, x, \ldots) \in \Gamma(m) \}.$$ □
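A Python sketch of the forget operator over the interval basis (hypothetical names). One deviation from the formula above is deliberate and labeled in the code: the diagonal entry $m_{ii}$ is kept at $\{0\}$ so that the result stays coherent in the sense of Definition 3. Closing first preserves the implicit bound on $v_2$ that would otherwise be lost with $v_1$.

```python
import math

TOP = (-math.inf, math.inf)  # interval basis; None is bottom


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def plus(a, b):
    if a is None or b is None:
        return None
    return (a[0] + b[0], a[1] + b[1])


def neg(a):
    return None if a is None else (-a[1], -a[0])


def make_matrix(n, constraints):
    m = [[TOP] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = (0, 0)
    for (i, j), itv in constraints.items():
        m[i][j] = meet(m[i][j], itv)
        m[j][i] = meet(m[j][i], neg(itv))
    return m


def closure(m):
    n = len(m)
    for k in range(n):
        prev = m
        m = [[meet(prev[i][j], plus(prev[i][k], prev[k][j])) for j in range(n)]
             for i in range(n)]
    return m


def forget(m, i):
    """(v_i <- ?): close first, then drop every constraint involving v_i."""
    mc = closure(m)
    n = len(mc)
    return [[(0, 0) if k == l == i  # our choice: keep γ(m_ii) = {0} for coherence
             else TOP if i in (k, l)
             else mc[k][l]
             for l in range(n)] for k in range(n)]


m = make_matrix(3, {(0, 1): (0, 5), (1, 2): (1, 2)})
f = forget(m, 1)
print(f[0][1], f[0][2])  # (-inf, inf) (1, 7): v2 ∈ [1, 7] is kept
```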



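Assignment. For assignments of the form $(v_i \leftarrow v_j + c)$, one can find an exact
abstract counterpart:

$$[m_{(v_i \leftarrow v_i + c)}]_{kl} \stackrel{\text{def}}{=} \begin{cases} m_{kl} \boxplus (-c)^\sharp & \text{if } k = i \text{ and } l \neq i \\ m_{kl} \boxplus c^\sharp & \text{if } l = i \text{ and } k \neq i \\ m_{kl} & \text{elsewhere} \end{cases}$$

$$m_{(v_i \leftarrow v_j + c)} \stackrel{\text{def}}{=} (m_{(v_i \leftarrow ?)})_{(v_i - v_j \in \{c\}\,?)} \quad \text{when } i \neq j.$$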



For generic assignments $(v_i \leftarrow expr(v_0, \ldots, v_{N-1}))$, one can always fall back
to an imprecise non-relational analysis, first projecting the variables, then using the
abstraction $expr^\sharp$ of $expr$ in our basis:

$$m_{(v_i \leftarrow expr(v_0, \ldots, v_{N-1}))} \stackrel{\text{def}}{=} (m_{(v_i \leftarrow ?)})_{(v_i \in \gamma(C^\sharp)\,?)}$$

where $C^\sharp = expr^\sharp(m^*_{00}, \ldots, m^*_{0(N-1)})$.

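Trying to be the most precise in all cases may lead to complex algorithms.
It seems only worth trying to be a little more precise in some widespread cases,
such as $(v_i \leftarrow v_j + v_k)$, for instance:

$$m_{(v_i \leftarrow v_j + v_k)} \stackrel{\text{def}}{=} \Big( \big( (m_{(v_i \leftarrow ?)})_{(v_i \in \gamma(m^*_{0j} \boxplus m^*_{0k})\,?)} \big)_{(v_i - v_j \in \gamma(m^*_{0k})\,?)} \Big)_{(v_i - v_k \in \gamma(m^*_{0j})\,?)}$$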
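The exact assignment $v_i \leftarrow v_i + c$ and the case $v_i \leftarrow v_j + c$ with $i \neq j$ (forget, then guard) can be sketched as follows in Python over the interval basis. All names are illustrative assumptions; `assign_copy` follows the forget-then-guard pattern of the formulas above rather than any published implementation.

```python
import math

TOP = (-math.inf, math.inf)  # interval basis; None is bottom


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def plus(a, b):
    if a is None or b is None:
        return None
    return (a[0] + b[0], a[1] + b[1])


def neg(a):
    return None if a is None else (-a[1], -a[0])


def make_matrix(n, constraints):
    m = [[TOP] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = (0, 0)
    for (i, j), itv in constraints.items():
        m[i][j] = meet(m[i][j], itv)
        m[j][i] = meet(m[j][i], neg(itv))
    return m


def closure(m):
    n = len(m)
    for k in range(n):
        prev = m
        m = [[meet(prev[i][j], plus(prev[i][k], prev[k][j])) for j in range(n)]
             for i in range(n)]
    return m


def forget(m, i):
    mc = closure(m)
    n = len(mc)
    return [[(0, 0) if k == l == i else TOP if i in (k, l) else mc[k][l]
             for l in range(n)] for k in range(n)]


def guard(m, i, j, c):
    """Test (v_j - v_i ∈ c ?)."""
    out = [row[:] for row in m]
    out[i][j] = meet(m[i][j], c)
    out[j][i] = meet(m[j][i], neg(c))
    return out


def incr(m, i, c):
    """Exact (v_i <- v_i + c): v_l - v_i decreases by c, v_i - v_k increases by c."""
    n = len(m)
    return [[plus(m[k][l], (-c, -c)) if k == i and l != i
             else plus(m[k][l], (c, c)) if l == i and k != i
             else m[k][l]
             for l in range(n)] for k in range(n)]


def assign_copy(m, i, j, c):
    """(v_i <- v_j + c) for i != j: forget v_i, then test v_i - v_j ∈ {c}."""
    assert i != j
    return guard(forget(m, i), j, i, (c, c))


m = make_matrix(3, {(0, 1): (0, 5), (1, 2): (1, 2)})
print(incr(m, 1, 10)[0][1], closure(assign_copy(m, 2, 1, 3))[0][2])  # (10, 15) (3, 8)
```

The sign convention follows Definition 3: row $i$ holds constraints on $v_l - v_i$, which shrink by $c$ when $v_i$ grows by $c$.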



Interaction with the Closure. Some of the above operators require the matrix
argument(s) to be closed. Some respect closure (the result is closed if the
arguments are) and some do not (intersection, guard, assignment, etc.).
We thus advocate the use of a lazy method that remembers when a matrix is
in closed form, and recomputes the closure only when needed. When only a few
rows and columns of the matrix are changed (guard, assignment, etc.), we can
use the incremental closure. It is useless when all coefficients are changed at once
(intersection, widening).

4.5 Some Constructed Domains 

We are now ready to apply our construction to the bases presented in Section 3.3,
thanks to the following theorem:

Theorem 12. $\mathcal{C}^{cst}$, $\mathcal{C}^{[a,b]}$, $\mathcal{C}^{a\mathbb{Z}+b}$, and $\mathcal{C}^{a\mathbb{Z}+b}_{\mathbb{Q}}$ respect Hypotheses 1. □

Translated Equality Domain. The simplest domain is obtained from the
constant basis $\mathcal{C}^{cst}$ and represents constraints of the form $(v_i = v_j + c)$. This
domain is not of great practical interest: its expressive power is low as it is a
particular case of the following two domains. More efficient solutions may exist,
as we are not very far from simple equality constraints $v_i = v_j$, for which very
efficient algorithms are known (such as the Union-Find algorithm [4, §22]).

Zone Domain. In order to represent invariants of the form $(v_i - v_j \le c)$, one
can think of the basis of initial segments $\{\, ]-\infty, a] \mid a \in \mathbb{I} \cup \{+\infty\} \}$, but initial
segments are not closed under the $\boxminus$ operation (Hypothesis 1.1). Completing
this basis, one naturally finds the interval basis $\mathcal{C}^{[a,b]}$.

Compared to classical DBMs [23], the domain obtained is a little redundant
(each constraint is represented twice), but has exactly the same expressiveness
and complexity. It has the advantage of being implementable on top of any existing
interval library, greatly reducing the programming effort. One can also enhance
the zone domain in $\mathbb{Q}$ and $\mathbb{R}$ using the interval basis with possibly open bounds,
which manipulates both strict and non-strict constraints.

Zone-Congruence Domain. Using the integer congruence basis $\mathcal{C}^{a\mathbb{Z}+b}$, one
builds a domain that discovers constraints of the form $(v_i - v_j \equiv a\ [b])$. This
construction resembles periodicity graphs [28], but we treat here the case of least
upper bound and general-purpose transfer functions in detail, whereas [28] is
only interested in satisfiability, normal form, and conjunction. Moreover, we feel
that [28] misses the correct proof of the normal form theorem (our Theorem 6)
and does not recognize that it relies on some strong properties of congruence
sets (Hypotheses 1). Our framework can also extend this domain to a domain of
rational congruences: $(v_i - v_j \equiv a\ [b])$ with $a, b \in \mathbb{Q}$.



Product Domain. Reduced product is a well-known technique [5] for improving
the precision of an analysis by combining the power of two abstract domains. It
often gives better results than two separate analyses, because it conveys
information from one domain to the other during the analysis via a so-called reduction
procedure, which is a pair of binary operators $(\triangleleft, \triangleright)$ such that:

$$C_1 \triangleleft C_2 \sqsubseteq_1 C_1, \qquad C_1 \triangleright C_2 \sqsubseteq_2 C_2, \qquad \gamma_1(C_1 \triangleleft C_2) \cap \gamma_2(C_1 \triangleright C_2) = \gamma_1(C_1) \cap \gamma_2(C_2).$$

In our case, the reduction can be defined on bases, as long as Hypotheses 1
are not broken, with the exact same precision benefit. Moreover, reductions are
easier to design on non-relational bases. For example, if we use the following
reduction between $\mathcal{C}^{[a,b]}$ and $\mathcal{C}^{a\mathbb{Z}+b}$, we obtain a basis allowing the construction
of a domain for constraints of the form $(v_i - v_j \in a \cdot [b, c] + d)$:

$$[a, b] \triangleleft (c\mathbb{Z} + d) \stackrel{\text{def}}{=} [\, \min\{ x \in c\mathbb{Z} + d \mid x \ge a \},\ \max\{ x \in c\mathbb{Z} + d \mid x \le b \} \,];$$

$$[a, b] \triangleright (c\mathbb{Z} + d) \stackrel{\text{def}}{=} c\mathbb{Z} + d.$$
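A Python sketch of the reduction $\triangleleft$ above, assuming integer bounds, the congruence encoded as `(c, d)` with `c = 0` standing for $c = \infty$ (the singleton $\{d\}$), and `None` for $\bot$; these conventions are ours. The operator $\triangleright$ is the identity on the congruence, so it is not coded.

```python
import math


def reduce_interval(itv, cong):
    """[a, b] ◁ (cZ + d): move each finite bound to the nearest member of cZ + d inside [a, b]."""
    if itv is None or cong is None:
        return None
    (a, b), (c, d) = itv, cong
    if c == 0:  # c = ∞: the congruence is the singleton {d}
        return (d, d) if a <= d <= b else None
    lo = a if a == -math.inf else a + (d - a) % c  # least x >= a with x ≡ d [c]
    hi = b if b == math.inf else b - (b - d) % c   # greatest x <= b with x ≡ d [c]
    return (lo, hi) if lo <= hi else None


print(reduce_interval((0, 10), (2, 1)))  # (1, 9)
print(reduce_interval((2, 2), (2, 1)))   # None: 2 is even
```

For the random walk of Figure 1, $[-5, 5] \triangleleft (2\mathbb{Z} + 1)$ leaves the interval unchanged, since both bounds are already odd.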



Failure. So far, all seems to work well. However, one can find some bases used in
very common abstract domains that do not respect Hypotheses 1. For example,
the sign basis [6] $\mathcal{C}^{\pm} = \{ \bot, ]-\infty, 0], [0, 0], [0, +\infty[, ]-\infty, +\infty[ \}$ and the open
interval basis $\mathcal{C}^{]a,b[}$ do not respect Hypothesis 1.2, as they cannot represent every
singleton exactly. The interval congruence basis $\mathcal{C}^{a\mathbb{Z}+[b,c]}$ (introduced in [20]) does
not respect Hypothesis 1.1 (it is not stable under intersection). We do not know if
it is possible to build weakly relational domains from such bases.



Modularity. As for non-relational domains, the weakly relational domain fam- 
ily is simply a generic module functor, taking the very same bases implementation 
modules as parameter. 



5 Applications and Future Work 

Applications. So far, this framework has only been implemented as an OCaml
prototype and tried on a few toy examples. At program point (•) of the program
in Figure 1, the reduced product of the zone and zone-congruence domains found
the invariant $(x \le 5,\ x \equiv 1\ [2])$, which is almost as good as polyhedron and
congruence equality analyses combined (Figure 2). It failed to discover that
$(x \ge -5)$; however, the octagon abstract domain of [24], which also uses graph-based
algorithms, can do it.

If, in the program of Figure 1, the constant 5 is replaced by a variable m whose
value is not known at analysis time, the analyzer still finds the precise
symbolic invariant $(x \le m,\ x \equiv m\ [2])$.



Scalability. It is still unknown whether graph-based abstract domains scale up. Because of the quadratic memory cost, such a domain cannot handle all the variables of a large program at once; one has to split this set into packets in which relational information might be important. These packets do not need to be disjoint, and one can use pivot variables to transfer information between packets. We are currently investigating such methods.

Because our domain family is relational, it is also adapted to symbolic and modular analyses. One can cut down the cost of an analysis and make it incremental by analyzing small pieces of a program separately [8].

Theoretical Extensions. In this article, we tried to unite some graph-based numerical satisfiability algorithms and extend them to an abstract domain, in a unified framework. However, a few graph-based algorithms are not handled here: the octagon abstract domain [21] ($\pm x \pm y \le c$ constraints) and Shostak's satisfiability algorithm [57] ($\alpha x + \beta y \le c$). It would be interesting to unite all of them in a general framework and derive a numerical abstract domain for constraints of the form $(\alpha x + \beta y \le c)$.

6 Conclusion 

In this paper, we have proposed the systematic construction of a family of relational domains that represent and manipulate constraints of the form $(x - y \in C)$. This construction can be seen as a functor lifting non-relational domains to relational ones. The memory cost of an abstract state is quadratic, and each transfer function application performs, at worst, a cubic number of operations in the non-relational domain. The crux of the method is the adaptation of the shortest-path closure algorithm to compute a normal form, allowing the derivation of most abstract operators and transfer functions.

In this framework, we have successfully retrieved the existing DBM domain, 
and constructed new ones. It is the author's opinion that these domains fill 
a precision and complexity gap between former non-relational and relational 
domains, and can be used to design medium-cost, yet precise, analyses.
C. Hymans, as well as the anonymous referees, for their useful comments.

Appendix

A Proof of the Main Theorem 

We present here the complete proof of the main theorem, Theorem 6. It is the proof of this theorem that motivated the choice of Hypotheses 1.

Note that the proof of this theorem is much simpler in the special case of the interval basis $C_{[a,b]}$ (see Theorem 2 in the author's master's thesis [22]).

Note also that part of this theorem, for the congruence case $C_{a\mathbb{Z}+b}$, is discussed by Toman and Chomicki in [28], but the proof is somewhat eschewed (Lemma 2.12). Our proof relies heavily on the fact that $C_{a\mathbb{Z}+b}$ verifies Hypothesis 1.3, which is not trivial.
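As it is used in the proof of Lemma 7, Hypothesis 1.3 appears to be a Helly-type property: if finitely many basis elements have an empty intersection, then two of them already do. For congruences this is an instance of the generalized Chinese remainder theorem. The following sketch, our own and not part of the original development, checks it by brute force for all triples of classes $a\mathbb{Z}+b$ with $1 \le a \le 6$; the helper names are hypothetical and singleton classes ($a = 0$) are not covered.

```python
import itertools
from math import gcd


def pairwise_compatible(congs):
    """x = b1 (mod a1) and x = b2 (mod a2) share a solution iff gcd(a1, a2) divides b1 - b2."""
    return all((b1 - b2) % gcd(a1, a2) == 0
               for (a1, b1), (a2, b2) in itertools.combinations(congs, 2))


def has_common_solution(congs):
    """Brute force: search one period (the lcm of the moduli) for a common element."""
    period = 1
    for a, _ in congs:
        period = period * a // gcd(period, a)
    return any(all((x - b) % a == 0 for a, b in congs) for x in range(period))


classes = [(a, b) for a in range(1, 7) for b in range(a)]   # aZ + b with 1 <= a <= 6
mismatches = [t for t in itertools.combinations(classes, 3)
              if pairwise_compatible(t) != has_common_solution(t)]
print(len(mismatches))   # 0: pairwise non-empty intersections imply a common point
```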

Proof of Theorem 6

• Claim: $\Gamma(\mathbf{m}^*) = \Gamma(\mathbf{m})$. □

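We have $\forall k,i,j,\ \mathbf{m}^{k+1}_{ij} = \mathbf{m}^k_{ij} \sqcap (\mathbf{m}^k_{ik} \boxplus \mathbf{m}^k_{kj}) \sqsubseteq \mathbf{m}^k_{ij}$ (Hypotheses 1), so $\forall k,\ \Gamma(\mathbf{m}^{k+1}) \subseteq \Gamma(\mathbf{m}^k)$. Conversely, for all $i,j,k$ and $(x_0, \ldots, x_{N-1}) \in \Gamma(\mathbf{m}^k)$, we have $x_k - x_i \in \gamma(\mathbf{m}^k_{ik})$ and $x_j - x_k \in \gamma(\mathbf{m}^k_{kj})$. By summation, $x_j - x_i \in \gamma(\mathbf{m}^k_{ik} \boxplus \mathbf{m}^k_{kj})$ (Hypotheses 1). Since we also have $x_j - x_i \in \gamma(\mathbf{m}^k_{ij})$, we get $x_j - x_i \in \gamma(\mathbf{m}^{k+1}_{ij})$, and so $\forall k,\ \Gamma(\mathbf{m}^k) \subseteq \Gamma(\mathbf{m}^{k+1})$. From these two inclusions, we deduce $\forall k,\ \Gamma(\mathbf{m}^{k+1}) = \Gamma(\mathbf{m}^k)$, so $\Gamma(\mathbf{m}^*) = \Gamma(\mathbf{m})$. ■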
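A small brute-force experiment can illustrate this first claim. The following sketch is ours and rests on several assumptions: the basis is integer intervals represented as pairs `(lo, hi)` with `None` for ⊥; the hypothetical helpers `meet`, `add`, `closure` and `concretize` play the roles of $\sqcap$, $\boxplus$, the closure and $\Gamma$; $x_0$ is fixed to $0$; and $\Gamma$ is enumerated inside a finite box, which is exact here only because every constraint on $v_j - v_0$ is bounded inside that box.

```python
import itertools


def meet(a, b):
    """Interval intersection; None encodes the empty interval."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def add(a, b):
    """Interval addition."""
    return None if a is None or b is None else (a[0] + b[0], a[1] + b[1])


def closure(m):
    """m* = m^N with m^{k+1}_ij = meet(m^k_ij, add(m^k_ik, m^k_kj))."""
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        prev = [row[:] for row in m]
        for i in range(n):
            for j in range(n):
                m[i][j] = meet(prev[i][j], add(prev[i][k], prev[k][j]))
    return m


def concretize(m, box):
    """Gamma(m) with x_0 = 0 and the other x_i ranging over the finite `box`."""
    n = len(m)
    return {
        x for x in ((0,) + rest for rest in itertools.product(box, repeat=n - 1))
        if all(m[i][j] is not None and m[i][j][0] <= x[j] - x[i] <= m[i][j][1]
               for i in range(n) for j in range(n))
    }


# Coherent 3x3 matrix: m[i][j] bounds x_j - x_i, and m[j][i] is its opposite.
m = [[(0, 0), (-1, 3), (0, 4)],
     [(-3, 1), (0, 0), (2, 5)],
     [(-4, 0), (-5, -2), (0, 0)]]
box = range(-6, 7)
m_star = closure(m)
print(m_star[0])                                        # [(0, 0), (-1, 2), (1, 4)]
print(concretize(m_star, box) == concretize(m, box))    # True
```

The closure tightens several entries (for instance $[-1,3]$ becomes $[-1,2]$ on $v_1 - v_0$) without changing the set of solutions, as the claim predicts.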



• Claim: if $\Gamma(\mathbf{m}) \neq \emptyset$, then $\mathbf{m}^*$ is coherent. □

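Proof. Suppose that $\mathbf{m}$ is coherent. We first prove that $\forall i,j,\ \mathbf{m}^*_{ji} = \boxminus \mathbf{m}^*_{ij}$. By recurrence on $k$, one proves that $\forall k,i,j,\ \mathbf{m}^k_{ji} = \boxminus \mathbf{m}^k_{ij}$, using the identity $\boxminus(\mathbf{m}^k_{ij} \sqcap (\mathbf{m}^k_{ik} \boxplus \mathbf{m}^k_{kj})) = (\boxminus \mathbf{m}^k_{ij}) \sqcap ((\boxminus \mathbf{m}^k_{ik}) \boxplus (\boxminus \mathbf{m}^k_{kj}))$ (Hypotheses 1).

Now, we know that $\forall i,\ \mathbf{m}^*_{ii} \sqsubseteq \mathbf{m}_{ii}$, so $\forall i,\ \gamma(\mathbf{m}^*_{ii}) \subseteq \gamma(\mathbf{m}_{ii}) = \{0\}$. If, for some $i$, $\gamma(\mathbf{m}^*_{ii}) \neq \{0\}$, then $\gamma(\mathbf{m}^*_{ii}) = \emptyset$ and, obviously, $\Gamma(\mathbf{m}^*) = \emptyset$. Because of the preceding point, this contradicts the fact that $\Gamma(\mathbf{m}) \neq \emptyset$. ■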

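• Lemma 1: for any fixed $0 \le i,j \le N-1$, $0 \le k \le N$, and path $\langle i = i_1, \ldots, i_n = j\rangle$ in $\mathbf{m}$ such that $i_l < k$ for $1 < l < n$, and $i_s \neq i_t$ for $1 < s < t < n$, we have $\mathbf{m}^k_{ij} \sqsubseteq \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}$. □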



Corollary. Applying this lemma with $k = N$, we get: for all simple paths $\langle i = i_1, \ldots, i_n = j\rangle$, $\mathbf{m}^*_{ij} \sqsubseteq \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}$. □

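Proof. By recurrence. The property is obvious for $k = 0$: the only admissible path is then the single edge $\langle i, j\rangle$, so the property is equivalent to $\mathbf{m}^0_{ij} \sqsubseteq \mathbf{m}_{ij}$, and we have $\mathbf{m}^0 = \mathbf{m}$. Suppose that the property is true for some $k < N$ and let $\langle i = i_1, \ldots, i_n = j\rangle$ be a path satisfying the hypotheses of the lemma for $k+1$. If $\forall l \in \{2, \ldots, n-1\},\ i_l < k$, the property is true by the recurrence hypothesis and because $\mathbf{m}^{k+1}_{ij} \sqsubseteq \mathbf{m}^k_{ij}$. Otherwise, if there exists an $l$ such that $i_l \ge k$, we know that it is unique and that $i_l = k$. By definition of $\mathbf{m}^{k+1}$, we have $\mathbf{m}^{k+1}_{ij} \sqsubseteq \mathbf{m}^k_{ik} \boxplus \mathbf{m}^k_{kj}$. We obtain the expected result by applying the recurrence hypothesis to $\langle i = i_1, \ldots, i_l = k\rangle$ in $\mathbf{m}^k_{ik}$ and to $\langle k = i_l, \ldots, i_n = j\rangle$ in $\mathbf{m}^k_{kj}$, and by using the associativity of $\boxplus$. ■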

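Lemma 2: if, for some $0 \le i,j < N$, $\gamma\big(\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}\big) = \emptyset$, then $\Gamma(\mathbf{m}) = \emptyset$. □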

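Proof. Suppose that $\gamma\big(\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}\big) = \emptyset$ but $\Gamma(\mathbf{m}) \neq \emptyset$. Take some $(x_0, \ldots, x_{N-1}) \in \Gamma(\mathbf{m})$. For any path $\langle i = i_1, \ldots, i_n = j\rangle$, we have $\forall l \in \{1, \ldots, n-1\},\ x_{i_{l+1}} - x_{i_l} \in \gamma(\mathbf{m}_{i_l i_{l+1}})$. By summation, $x_j - x_i \in \gamma(\boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}})$. Thus

$$x_j - x_i \in \bigcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \gamma\big(\boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}\big) = \gamma\big(\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}\big)$$

(Hypothesis 1.1), which is therefore not empty: a contradiction. ■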

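Lemma 3: if $\forall 0 \le i,j < N,\ \gamma\big(\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}\big) \neq \emptyset$, then $\forall 0 \le i,j < N,\ 0 \le k \le N,\ \bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}} \sqsubseteq \mathbf{m}^k_{ij}$. □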

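Corollary. When we set $k = N$ in the lemma, we get $\forall i,j,\ \bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}} \sqsubseteq \mathbf{m}^*_{ij}$. □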

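Proof. By recurrence. If $k = 0$, then $\mathbf{m}^0_{ij} = \mathbf{m}_{ij}$ because $\mathbf{m}^0 = \mathbf{m}$; as the single-edge path $\langle i, j\rangle$ is one of the paths in the greatest lower bound, the lemma is true a fortiori. Suppose that the property is true for some $k < N$. Since $\mathbf{m}^{k+1}_{ij} = \mathbf{m}^k_{ij} \sqcap (\mathbf{m}^k_{ik} \boxplus \mathbf{m}^k_{kj})$ and the recurrence hypothesis already bounds $\mathbf{m}^k_{ij}$, to prove the property for $k+1$ we only have to prove that

$$\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}} \sqsubseteq \mathbf{m}^k_{ik} \boxplus \mathbf{m}^k_{kj}.$$

By anti-monotonicity of $\bigsqcap$ on $\mathcal{P}(C)$ ($A \subseteq B \subseteq C \implies \bigsqcap B \sqsubseteq \bigsqcap A$), we only consider the set of paths from $i$ to $j$ that pass through variable $k$:

$$\begin{aligned}
\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}
&\sqsubseteq \bigsqcap_{1<m<n,\ \langle i=i_1,\ldots,i_m=k,\ldots,i_n=j\rangle} \Big( \big(\boxplus_{l=1}^{m-1} \mathbf{m}_{i_l i_{l+1}}\big) \boxplus \big(\boxplus_{l=m}^{n-1} \mathbf{m}_{i_l i_{l+1}}\big) \Big) \\
&= \Big( \bigsqcap_{1<m,\ \langle i=i_1,\ldots,i_m=k\rangle} \boxplus_{l=1}^{m-1} \mathbf{m}_{i_l i_{l+1}} \Big) \boxplus \Big( \bigsqcap_{1<p,\ \langle k=i_1,\ldots,i_p=j\rangle} \boxplus_{l=1}^{p-1} \mathbf{m}_{i_l i_{l+1}} \Big).
\end{aligned}$$

The last equality comes from Hypothesis 1.4, thanks to $\forall i,j,\ \gamma\big(\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}\big) \neq \emptyset$.

To obtain the result, we apply the recurrence hypothesis to $\mathbf{m}^k_{ik}$ and $\mathbf{m}^k_{kj}$. ■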



Remark: the restricted distributivity of $\sqcap$ over $\boxplus$ is crucial in the proof of this lemma.

Lemma 4: if $\exists i,\ 0 \notin \gamma(\mathbf{m}^*_{ii})$, then $\Gamma(\mathbf{m}) = \emptyset$. □

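Proof. Suppose that, for some $i$, $0 \notin \gamma(\mathbf{m}^*_{ii})$. This means that $\forall x_i \in I,\ x_i - x_i \notin \gamma(\mathbf{m}^*_{ii})$, so $\Gamma(\mathbf{m}^*) = \emptyset$. By the first claim, $\Gamma(\mathbf{m}^*) = \Gamma(\mathbf{m})$, we get $\Gamma(\mathbf{m}) = \emptyset$. ■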

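Lemma 5: if $\forall i,\ 0 \in \gamma(\mathbf{m}^*_{ii})$, then $\forall i,j$,

$$\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}} \;=\; \bigsqcap_{\substack{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle\\ \text{simple path}}} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}.$$

□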



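Proof. The $\sqsubseteq$ part of the equality is a direct consequence of the fact that $\bigsqcap$ is $\subseteq$-anti-monotonic for elements of $\mathcal{P}(C)$, as simple paths form a subset of all paths.

For the $\sqsupseteq$ part, we prove that, for each path with at least one cycle in it, there exists a path with one simple cycle less which has a smaller $\boxplus$ sum. Let $\langle i = i_1, \ldots, i_s, \ldots, i_t = i_s, \ldots, i_n = j\rangle$ be a path and $\langle i_s, \ldots, i_t = i_s\rangle$ a simple cycle in it. By Lemma 1, $\boxplus_{l=s}^{t-1} \mathbf{m}_{i_l i_{l+1}} \sqsupseteq \mathbf{m}^*_{i_s i_s}$. By hypothesis, we have $0 \in \gamma(\mathbf{m}^*_{i_s i_s})$. Thus, $0 \in \gamma(\boxplus_{l=s}^{t-1} \mathbf{m}_{i_l i_{l+1}})$ and

$$\boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}} \;\sqsupseteq\; \big(\boxplus_{l=1}^{s-1} \mathbf{m}_{i_l i_{l+1}}\big) \boxplus \big(\boxplus_{l=t}^{n-1} \mathbf{m}_{i_l i_{l+1}}\big).$$

Repeating this step until no cycle remains shows that every path sum is $\sqsupseteq$ the sum of some simple path, which gives the $\sqsupseteq$ part. ■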



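Lemma 6: if $\Gamma(\mathbf{m}) \neq \emptyset$, then $\forall i,j,\ \mathbf{m}^*_{ij} = \bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}$, $\forall i,j,k,\ \mathbf{m}^*_{ij} \sqsubseteq \mathbf{m}^*_{ik} \boxplus \mathbf{m}^*_{kj}$, and $\mathbf{m}^{**} = \mathbf{m}^*$. □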

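Proof. Suppose that $\Gamma(\mathbf{m}) \neq \emptyset$. By Lemma 2, $\forall i,j,\ \gamma\big(\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}\big) \neq \emptyset$. Thus, we can apply (the corollaries of) Lemmas 1 and 3 to get

$$\bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}} \;\sqsubseteq\; \mathbf{m}^*_{ij} \;\sqsubseteq\; \bigsqcap_{\substack{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle\\ \text{simple path}}} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}.$$

By (the contrapositive of) Lemma 4, $\forall i,\ 0 \in \gamma(\mathbf{m}^*_{ii})$. Thus, we can apply Lemma 5 to get, $\forall i,j$,

$$\mathbf{m}^*_{ij} \;=\; \bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}} \;=\; \bigsqcap_{\substack{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle\\ \text{simple path}}} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}}.$$

Applying a method similar to the one used in Lemma 3, we get, $\forall i,j,k$,

$$\mathbf{m}^*_{ij} = \bigsqcap_{1<n,\ \langle i=i_1,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}} \;\sqsubseteq\; \bigsqcap_{1<m<n,\ \langle i=i_1,\ldots,i_m=k,\ldots,i_n=j\rangle} \boxplus_{l=1}^{n-1} \mathbf{m}_{i_l i_{l+1}} \;=\; \mathbf{m}^*_{ik} \boxplus \mathbf{m}^*_{kj}.$$

Using $\forall i,j,k,\ \mathbf{m}^*_{ij} \sqsubseteq \mathbf{m}^*_{ik} \boxplus \mathbf{m}^*_{kj}$ in the definition of $\mathbf{m}^{**}$, we get, by recurrence, $\forall i,j,k,\ (\mathbf{m}^*)^{k+1}_{ij} = (\mathbf{m}^*)^k_{ij}$. So $\mathbf{m}^{**} = \mathbf{m}^*$. ■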
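Lemma 6 can be checked in the same spirit on integer intervals. In this hypothetical sketch (all helper names are ours), `meet_over_simple_paths` enumerates simple paths with `itertools.permutations`, which is legitimate here because Lemma 5 allows restricting the greatest lower bound to simple paths when $\Gamma(\mathbf{m}) \neq \emptyset$; the three printed tests correspond to the three parts of the lemma.

```python
import itertools
from functools import reduce


def meet(a, b):
    """Interval intersection; None encodes the empty interval."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def add(a, b):
    """Interval addition."""
    return None if a is None or b is None else (a[0] + b[0], a[1] + b[1])


def leq(a, b):
    """Interval inclusion; the empty interval is below everything."""
    return a is None or (b is not None and b[0] <= a[0] and a[1] <= b[1])


def closure(m):
    """m* = m^N with m^{k+1}_ij = meet(m^k_ij, add(m^k_ik, m^k_kj))."""
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        prev = [row[:] for row in m]
        for i in range(n):
            for j in range(n):
                m[i][j] = meet(prev[i][j], add(prev[i][k], prev[k][j]))
    return m


def meet_over_simple_paths(m, i, j):
    """Meet of path sums over i -> ... -> j with distinct intermediate nodes other than i, j."""
    others = [v for v in range(len(m)) if v not in (i, j)]
    sums = []
    for r in range(len(others) + 1):
        for middle in itertools.permutations(others, r):
            path = (i,) + middle + (j,)
            total = (0, 0)
            for u, v in zip(path, path[1:]):
                total = add(total, m[u][v])
            sums.append(total)
    return reduce(meet, sums)


m = [[(0, 0), (-1, 3), (0, 4)],
     [(-3, 1), (0, 0), (2, 5)],
     [(-4, 0), (-5, -2), (0, 0)]]
n = len(m)
m_star = closure(m)
# First part: each closed entry is the meet over (simple) paths.
print(all(m_star[i][j] == meet_over_simple_paths(m, i, j)
          for i in range(n) for j in range(n)))                      # True
# Second part: the closed matrix satisfies the triangle inequality.
print(all(leq(m_star[i][j], add(m_star[i][k], m_star[k][j]))
          for i in range(n) for j in range(n) for k in range(n)))     # True
# Third part: closure is idempotent.
print(closure(m_star) == m_star)                                      # True
```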



• Lemma 7: if $\Gamma(\mathbf{m}) = \emptyset$, then $\exists i,\ 0 \notin \gamma(\mathbf{m}^*_{ii})$. □

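Proof. We prove this property by recurrence on the size $N$ of the matrix. If $N = 1$, we obviously have $\Gamma(\mathbf{m}) = \{(0)\} \iff 0 \in \gamma(\mathbf{m}_{00})$, and $\Gamma(\mathbf{m}) = \emptyset \iff 0 \notin \gamma(\mathbf{m}_{00})$. By definition, we have $\mathbf{m}^*_{00} = \mathbf{m}_{00} \sqcap (\mathbf{m}_{00} \boxplus \mathbf{m}_{00})$, so $0 \in \gamma(\mathbf{m}_{00}) \iff 0 \in \gamma(\mathbf{m}^*_{00})$.

Suppose the property is true for some $N$. Let $\mathbf{m}$ be a matrix of size $N+1$ such that $\forall i,\ 0 \in \gamma(\mathbf{m}^*_{ii})$; we prove that $\Gamma(\mathbf{m}) \neq \emptyset$. Let $\mathbf{m}'$ be the matrix of size $N$ constructed as follows: $\forall i,j < N,\ \mathbf{m}'_{ij} = \mathbf{m}_{(i+1)(j+1)} \sqcap (\mathbf{m}_{(i+1)0} \boxplus \mathbf{m}_{0(j+1)})$. We have $\forall i,j,k,\ \mathbf{m}'^k_{ij} = \mathbf{m}^{k+1}_{(i+1)(j+1)}$, so $\forall i,j,\ \mathbf{m}'^*_{ij} = \mathbf{m}^*_{(i+1)(j+1)}$.

We deduce that $\forall i,\ 0 \in \gamma(\mathbf{m}'^*_{ii})$ and, by the recurrence hypothesis, $\Gamma(\mathbf{m}') \neq \emptyset$. Let us take $(x_1, \ldots, x_N) \in \Gamma(\mathbf{m}')$. We have $\forall i,j \ge 1,\ x_j - x_i \in \gamma(\mathbf{m}'_{(i-1)(j-1)}) \subseteq \gamma(\mathbf{m}_{ij})$.

Let us prove that we can choose $x_0$ such that $\forall i,\ x_0 - x_i \in \gamma(\mathbf{m}_{i0})$ and $x_i - x_0 \in \gamma(\mathbf{m}_{0i})$. This will prove that $(0, x_1 - x_0, \ldots, x_N - x_0) \in \Gamma(\mathbf{m})$, and so $\Gamma(\mathbf{m}) \neq \emptyset$.

First remark that $x_i - x_0 \in \gamma(\mathbf{m}_{0i}) \iff x_0 - x_i \in \gamma(\boxminus \mathbf{m}_{0i}) \iff x_0 - x_i \in \gamma(\mathbf{m}_{i0})$. Consider the set $C = \gamma\big(\bigsqcap_{1 \le i} (\{x_i\} \boxplus \mathbf{m}_{i0})\big)$. Then $C \neq \emptyset$; otherwise, by Hypothesis 1.3, there would exist $i,j \ge 1$ such that $\gamma\big((\{x_i\} \boxplus \mathbf{m}_{i0}) \sqcap (\{x_j\} \boxplus \mathbf{m}_{j0})\big) = \emptyset$, that is to say $x_j - x_i \notin \gamma(\mathbf{m}_{i0} \boxplus (\boxminus \mathbf{m}_{j0})) = \gamma(\mathbf{m}_{i0} \boxplus \mathbf{m}_{0j})$, which is absurd because $x_j - x_i \in \gamma(\mathbf{m}'_{(i-1)(j-1)}) \subseteq \gamma(\mathbf{m}_{i0} \boxplus \mathbf{m}_{0j})$. So $C$ is not empty and we simply choose any $x_0 \in C$. ■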

Remark: the fact that we can represent singletons, and the stability of $\boxplus$, are crucial in the proof of this lemma.
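Together, Lemmas 4 and 7 give an emptiness test for coherent matrices: $\Gamma(\mathbf{m}) = \emptyset$ if and only if some diagonal element of $\mathbf{m}^*$ does not contain $0$. A minimal sketch on integer intervals follows; the helper names and the two example matrices are our own assumptions, with `None` again standing for ⊥.

```python
def meet(a, b):
    """Interval intersection; None encodes the empty interval."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def add(a, b):
    """Interval addition."""
    return None if a is None or b is None else (a[0] + b[0], a[1] + b[1])


def closure(m):
    """m* = m^N with m^{k+1}_ij = meet(m^k_ij, add(m^k_ik, m^k_kj))."""
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        prev = [row[:] for row in m]
        for i in range(n):
            for j in range(n):
                m[i][j] = meet(prev[i][j], add(prev[i][k], prev[k][j]))
    return m


def is_empty(m):
    """Emptiness test from Lemmas 4 and 7: some diagonal entry of m* excludes 0."""
    m_star = closure(m)
    return any(d is None or not (d[0] <= 0 <= d[1])
               for d in (m_star[i][i] for i in range(len(m))))


feasible = [[(0, 0), (-1, 3), (0, 4)],
            [(-3, 1), (0, 0), (2, 5)],
            [(-4, 0), (-5, -2), (0, 0)]]
# x1 - x0 in [0, 1] and x2 - x1 in [0, 1] force x2 >= 0,
# while x0 - x2 in [1, 5] forces x2 <= -1: no solution.
infeasible = [[(0, 0), (0, 1), (-5, -1)],
              [(-1, 0), (0, 0), (0, 1)],
              [(1, 5), (-1, 0), (0, 0)]]
print(is_empty(feasible), is_empty(infeasible))   # False True
```

In the infeasible case the contradiction propagates as empty intervals during closure, so the diagonal entry becomes ⊥ rather than merely excluding $0$.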

• Claim: if $\Gamma(\mathbf{m}) \neq \emptyset$, then $\forall i_0 \neq j_0$ and $c \in \gamma(\mathbf{m}^*_{i_0 j_0})$, there exists $(x_0, \ldots, x_{N-1}) \in \Gamma(\mathbf{m})$ such that $x_{j_0} - x_{i_0} = c$. □

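Proof. By recurrence on $N$.

The case $N = 1$ is not of interest.

When $N = 2$ and $\Gamma(\mathbf{m}) \neq \emptyset$, $\Gamma(\mathbf{m}) = \Gamma(\mathbf{m}^*) = \{(x_0, x_1) \mid x_0 = 0,\ x_1 - x_0 \in \gamma(\mathbf{m}^*_{01})\}$. We can choose, without loss of generality, $i_0 = 0$ and $j_0 = 1$, so $c \in \gamma(\mathbf{m}^*_{01})$. Then, the property is obvious.

Suppose the property is true for some $N > 1$ and let $\mathbf{m}$ be a matrix of size $N+1$ with non-empty domain. We also suppose, without loss of generality, that $i_0, j_0 > 0$ (as $N + 1 > 2$, one can easily ensure $i_0, j_0 > 0$ using a simple variable permutation). We construct $\mathbf{m}'$ of size $N$ as in Lemma 7: $\forall i,j < N,\ \mathbf{m}'_{ij} = \mathbf{m}_{(i+1)(j+1)} \sqcap (\mathbf{m}_{(i+1)0} \boxplus \mathbf{m}_{0(j+1)})$. Recall that $\forall i,j,\ \mathbf{m}'^*_{ij} = \mathbf{m}^*_{(i+1)(j+1)}$ so, in particular, $c \in \gamma(\mathbf{m}'^*_{(i_0-1)(j_0-1)})$. Moreover, $\Gamma(\mathbf{m}') \neq \emptyset$: by Lemma 4, $\forall i,\ 0 \in \gamma(\mathbf{m}^*_{ii})$, hence $\forall i,\ 0 \in \gamma(\mathbf{m}'^*_{ii})$, and Lemma 7 applies.

Applying the recurrence hypothesis to $\mathbf{m}'$, there exists $(x_1, \ldots, x_N) \in \Gamma(\mathbf{m}')$ such that $x_{j_0} - x_{i_0} = c$. Then, we can find $x_0$, as in Lemma 7, such that $(0, x_1 - x_0, \ldots, x_N - x_0) \in \Gamma(\mathbf{m})$; this shift leaves the difference $x_{j_0} - x_{i_0} = c$ unchanged, which ends the proof. ■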
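This last claim states that the closure yields tight bounds. The brute-force sketch below is our own, on integer intervals, with hypothetical helper names and $\Gamma$ enumerated inside a finite box with $x_0 = 0$. It checks that every integer $c \in \gamma(\mathbf{m}^*_{i_0 j_0})$ with $i_0 \neq j_0$ is realized by some solution, and that the unclosed matrix does not have this property.

```python
import itertools


def meet(a, b):
    """Interval intersection; None encodes the empty interval."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def add(a, b):
    """Interval addition."""
    return None if a is None or b is None else (a[0] + b[0], a[1] + b[1])


def closure(m):
    """m* = m^N with m^{k+1}_ij = meet(m^k_ij, add(m^k_ik, m^k_kj))."""
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        prev = [row[:] for row in m]
        for i in range(n):
            for j in range(n):
                m[i][j] = meet(prev[i][j], add(prev[i][k], prev[k][j]))
    return m


def concretize(m, box):
    """Gamma(m) with x_0 = 0 and the other x_i ranging over the finite `box`."""
    n = len(m)
    return {
        x for x in ((0,) + rest for rest in itertools.product(box, repeat=n - 1))
        if all(m[i][j] is not None and m[i][j][0] <= x[j] - x[i] <= m[i][j][1]
               for i in range(n) for j in range(n))
    }


m = [[(0, 0), (-1, 3), (0, 4)],
     [(-3, 1), (0, 0), (2, 5)],
     [(-4, 0), (-5, -2), (0, 0)]]
box = range(-6, 7)   # large enough: every x_j - x_0 is bounded within [-4, 4]
m_star = closure(m)
sols = concretize(m, box)
n = len(m)
# Every value allowed by a closed entry m*_ij (i != j) is realized by some solution.
tight = all(
    any(x[j] - x[i] == c for x in sols)
    for i in range(n) for j in range(n) if i != j
    for c in range(m_star[i][j][0], m_star[i][j][1] + 1)
)
print(tight)                          # True
# The unclosed entry m_01 = [-1, 3] is not tight: no solution has x1 = 3.
print(any(x[1] == 3 for x in sols))   # False
```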



